Football is a very exciting sport and, to this day, the most popular game on the entire planet. Sorry not sorry, US sports.

I want to review data collected since 1872 to understand how matches between countries have evolved up to now. So, we call on R and a few libraries to help us visualize the data:

library(tidyverse)
library(plotly)

The first step is to read the files. I downloaded this dataset from Kaggle on 2021-07-22.

results<-read.csv("results.csv", encoding = "UTF-8")
shootouts<-read.csv("shootouts.csv", encoding = "UTF-8")

This dataset contains more than 42,000 international football matches between national teams. So, let’s take a little taste of the data:

head(results)
head(shootouts)

One interesting thing is to look at the context of the matches. Some of them may not be relevant at all, but there are also World Cup matches, continental tournaments, and so on:

levels(as.factor(results$tournament)) -> tournaments
sample(tournaments,20)
##  [1] "CONCACAF Championship"                     
##  [2] "Windward Islands Tournament"               
##  [3] "Vietnam Independence Cup"                  
##  [4] "Mundialito"                                
##  [5] "CONCACAF Nations League qualification"     
##  [6] "African Nations Championship"              
##  [7] "Balkan Cup"                                
##  [8] "Copa Lipton"                               
##  [9] "SAFF Cup"                                  
## [10] "Copa del Pacífico"                         
## [11] "Korea Cup"                                 
## [12] "Oceania Nations Cup qualification"         
## [13] "FIFA World Cup qualification"              
## [14] "United Arab Emirates Friendship Tournament"
## [15] "ELF Cup"                                   
## [16] "Copa América"                              
## [17] "Rous Cup"                                  
## [18] "Nordic Championship"                       
## [19] "Intercontinental Cup"                      
## [20] "Friendly"
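
Because `sample()` draws at random, the 20 tournaments listed above will differ on every run. A minimal sketch of how the draw could be made reproducible with `set.seed()`; the short `tournaments` vector and the seed value here are stand-ins for illustration, not the real data or a value used in the original analysis:

```r
# Stand-in vector (assumption): a handful of tournament names, not the full list
tournaments <- c("Friendly", "Copa América", "Balkan Cup", "SAFF Cup", "Korea Cup")

# Fixing the seed before each draw makes sample() return the same subset
set.seed(2021)  # arbitrary seed, chosen only for this example
first <- sample(tournaments, 3)

set.seed(2021)
second <- sample(tournaments, 3)

identical(first, second)
```

With the seed fixed, both draws return the same three names, so the printed sample stays stable between runs of the post.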

Filtering down to tournaments with more than 100 matches played in their history:

results %>%
  group_by(tournament) %>%
  summarise(count = n()) %>%
  filter(count > 100) %>%
  select(tournament) -> popularCups

results %>%
  filter(tournament %in% popularCups$tournament) %>%
  ggplot(aes(x = tournament, fill = tournament)) +
  geom_bar() +
  coord_flip() -> p
ggplotly(p)

Next, each match is scored for both sides: 3 points for a win, 1 for a draw, and 0 for a loss:

results %>%
  mutate(
    tied = home_score == away_score,
    home_points = ifelse(tied, 1, ifelse(home_score > away_score, 3, 0)),
    away_points = ifelse(tied, 1, ifelse(home_score > away_score, 0, 3))
  ) -> results
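
The scoring above gives 3 points for a win, 1 for a draw, and 0 for a loss. As a sanity check, here is a minimal sketch of the same rule on three made-up scorelines, written with `dplyr::case_when()` instead of nested `ifelse()`. The `toy` data frame is hypothetical and not taken from `results.csv`:

```r
library(dplyr)

# Made-up scores, for illustration only: a home win, a draw, and an away win
toy <- tibble(home_score = c(2, 1, 0), away_score = c(0, 1, 3))

toy_scored <- toy %>%
  mutate(
    # Each side gets 3 for a win, 1 for a draw, and 0 otherwise
    home_points = case_when(
      home_score > away_score ~ 3,
      home_score == away_score ~ 1,
      TRUE ~ 0
    ),
    away_points = case_when(
      away_score > home_score ~ 3,
      away_score == home_score ~ 1,
      TRUE ~ 0
    )
  )

print(toy_scored)
```

One difference to keep in mind: if a score is missing, the nested `ifelse()` version returns `NA`, while this `case_when()` sketch falls through to 0.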

results %>% filter(grepl("FIFA World Cup",tournament)) -> worldCupResults
head(worldCupResults)

Now we need to process the data a little, reshaping it so that each match yields one row per team:

results %>%
  pivot_longer(c(home_team, away_team), names_to = "homeaway", values_to = "team") %>%
  mutate(
    points = ifelse(grepl("home", homeaway), home_points, away_points),
    goals = ifelse(grepl("home", homeaway), home_score, away_score),
    receivedGoals = ifelse(grepl("home", homeaway), away_score, home_score)
  ) %>%
  select(date, tournament, country, team, points, goals, receivedGoals) -> results
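
To see what this reshaping does, here is a minimal sketch on two made-up matches. The `toy` data frame and its team names are hypothetical; only the `pivot_longer()` and `mutate()` logic mirrors the step above. Each match becomes two rows, one per team, with that team's own points, goals scored, and goals received:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: two made-up matches, already scored with the 3/1/0 rule
toy <- tibble(
  date = c("2000-01-01", "2000-01-02"),
  home_team = c("A", "B"),
  away_team = c("B", "C"),
  home_score = c(2, 1),
  away_score = c(0, 1),
  home_points = c(3, 1),
  away_points = c(0, 1)
)

toy_long <- toy %>%
  # One row per (match, side): "homeaway" records which column the team came from
  pivot_longer(c(home_team, away_team), names_to = "homeaway", values_to = "team") %>%
  mutate(
    points = ifelse(homeaway == "home_team", home_points, away_points),
    goals = ifelse(homeaway == "home_team", home_score, away_score),
    receivedGoals = ifelse(homeaway == "home_team", away_score, home_score)
  ) %>%
  select(date, team, points, goals, receivedGoals)

print(toy_long)
```

The two matches turn into four rows, which is what makes the later `group_by(team)` summaries possible without treating home and away sides separately.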

results %>% filter(grepl("FIFA World Cup",tournament)) -> worldCupResults

worldCupResults %>%
  group_by(team) %>%
  summarise(p = sum(points), goals = sum(goals), against = sum(receivedGoals), matches = n()) %>%
  mutate(performance = p / matches, ofensive = goals / matches, defense = against / matches) %>%
  arrange(desc(performance)) %>%
  head()

worldCupResults %>%
  group_by(team) %>%
  summarise(p = sum(points), goals = sum(goals), against = sum(receivedGoals), matches = n()) %>%
  mutate(performance = p / matches, ofensive = goals / matches, defense = against / matches) %>%
  ggplot(aes(x = performance, y = ofensive, size = defense, color = matches, text = team)) +
  geom_point() +
  labs(title = "Performance vs offensiveness of national teams in World Cup matches and qualifiers") -> p
ggplotly(p)

worldCupResults %>%
  filter(!grepl("qualifi", tournament)) %>%
  group_by(team) %>%
  summarise(p = sum(points), goals = sum(goals), against = sum(receivedGoals), matches = n()) %>%
  mutate(performance = p / matches, ofensive = goals / matches, defense = against / matches) %>%
  ggplot(aes(x = performance, y = ofensive, size = defense, color = matches, text = team)) +
  geom_point() +
  labs(title = "Performance vs offensiveness of national teams in FIFA World Cup matches") -> p
ggplotly(p)

results %>%
  group_by(team) %>%
  summarise(p = sum(points), goals = sum(goals), against = sum(receivedGoals), matches = n()) %>%
  mutate(performance = p / matches, ofensive = goals / matches, defense = against / matches) %>%
  ggplot(aes(x = performance, y = ofensive, size = defense, color = matches, text = team)) +
  geom_point() +
  labs(title = "Performance vs offense in all matches") -> p
ggplotly(p)

results %>%
  group_by(team) %>%
  summarise(p = sum(points), goals = sum(goals), against = sum(receivedGoals), matches = n()) %>%
  mutate(performance = p / matches, ofensive = goals / matches, defense = against / matches) %>%
  ggplot(aes(color = performance, x = ofensive, y = defense, text = paste(team, matches, sep = "\n"))) +
  geom_point() +
  labs(title = "Defense vs offense in all matches") -> p
ggplotly(p)
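
The same per-team summary is repeated before every plot above. One way it could be factored out is a small helper function. The name `team_summary` is hypothetical and not part of the original analysis, and the toy rows below are made up to show what the three ratios mean:

```r
library(dplyr)

# Hypothetical helper (assumption): wraps the summary pipeline used before each plot.
# Expects one row per team per match with columns team, points, goals, receivedGoals.
team_summary <- function(df) {
  df %>%
    group_by(team) %>%
    summarise(
      p = sum(points),
      goals = sum(goals),
      against = sum(receivedGoals),
      matches = n()
    ) %>%
    mutate(
      performance = p / matches,    # points per match
      ofensive = goals / matches,   # goals scored per match (column name kept from the post)
      defense = against / matches   # goals received per match, so lower is better
    )
}

# Made-up rows: team A plays twice (a 2-0 win and a 1-1 draw), team B once (a 0-2 loss)
toy <- tibble(
  team = c("A", "A", "B"),
  points = c(3, 1, 0),
  goals = c(2, 1, 0),
  receivedGoals = c(0, 1, 2)
)

toy_summary <- team_summary(toy)
print(toy_summary)
```

With such a helper, each plot would start from `team_summary(worldCupResults)` or `team_summary(results)` instead of repeating the `group_by()` and `summarise()` steps.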